Abstract:
The present invention relates to a DCT core structure that uses the distributed arithmetic (DA) method to perform 8 * 1 DCT / IDCT and 2 * (4 * 1) DCT / IDCT, with selectable 8 * 1 or 2 * (4 * 1) operation and selectable DCT or IDCT mode. The core comprises: a first multiplexer for selecting the 8 * 1 DCT, 2 * (4 * 1) DCT, or 2 * (4 * 1) IDCT mode from the input matrix signals (x0 to x7) and (y0 to y7); a first butterfly for performing addition and subtraction on the input matrix according to the mode selected by the first multiplexer; a second multiplexer for selecting the DCT or IDCT mode from the output of the first butterfly and the 8 * 1 IDCT input matrix; arithmetic means for storing cosine coefficients of the input matrix and performing a matrix operation on the signal selected by the second multiplexer and the stored cosine coefficients; a bit serial adder for obtaining, over all bits, the cumulative sum of the bit-wise matrix operation results calculated by the arithmetic means; a second butterfly for adding and subtracting the cumulative sums of the bit serial adder; and a third multiplexer for selectively outputting the output of the bit serial adder or the output of the second butterfly.
Publication number: KR19980061694A
Application number: KR1019960081068
Filing date: 1996-12-31
Publication date: 1998-10-07
Inventors: 백대환;류근장;김이섭;백승권
Applicant: 이우복;사단법인 고등기술연구원연구조합;
Main IPC class:
Description:

Discrete Cosine Transform Core Structure
The present invention relates to a DCT core used for digital image processing, and more particularly, to a DCT core capable of weighting coefficient processing.
As is well known, a digitized video signal can be transmitted with higher image quality than an analog signal. When a video signal composed of a series of video frames is represented in digital form, a considerable amount of data must be transmitted. However, because the available frequency bandwidth of a typical transmission channel is limited, the amount of data must be compressed in order to transmit it through the limited channel bandwidth.
Since a video signal has correlation, or redundancy, between pixels within a frame and between neighboring frames, it can be compressed without seriously degrading the image as a whole. Therefore, most conventional video signal encoding methods use compression techniques based on exploiting or eliminating this redundancy.
One category of such coding methods is the transform technique, which takes advantage of the redundancy that exists within a frame. It includes orthogonal transform methods that convert a block of digital image data into transform coefficients, for example, two-dimensional discrete cosine transform (DCT) coefficients.
In an orthogonal transform method such as the DCT, a video frame is divided into non-overlapping blocks of equal size, for example 8 * 8 pixel blocks, and each pixel block is converted from the spatial domain to the frequency domain. As a result, each pixel block yields a set of transform coefficients consisting of one DC coefficient and a plurality of (e.g., 63) AC coefficients. These transform coefficients represent the amplitudes of the frequency components of the block: the DC coefficient represents the average brightness of the pixels in the block, while the remaining AC coefficients represent the amplitudes of its spatial frequency components.
In the early days of DCT development, the DCT and IDCT were mainly implemented by way of an FFT applied to a rearranged input sequence. Among the algorithms published afterwards, the fast algorithms of Chen and Lee reduced the computational load and implemented the DCT / IDCT with multipliers in a form suitable for hardware.
The most time-consuming part of the DCT is multiplication. Fast algorithms have been proposed to improve execution speed, and attempts have been made to simplify implementation by computing the two-dimensional DCT as one-dimensional DCTs using row-column decomposition. However, a hardware implementation of these approaches has the disadvantage of requiring multipliers, which occupy a large area.
Therefore, an object of the present invention is to provide a DCT core in which the hardware necessary for computing the 8 * 8 DCT / IDCT and the 2 * 4 * 8 DCT / IDCT is shared, using the distributed arithmetic (DA) method, which performs multiplication without a multiplier.
FIG. 1 shows an 8 * 1 DCT / IDCT computational architecture using the DA method.
FIG. 2 shows a 2 * (4 * 1) DCT / IDCT computational architecture using the DA method.
FIG. 3 shows an 8 * 1 and 2 * (4 * 1) DCT / IDCT computational architecture using the DA method, constructed in accordance with the present invention.
* Description of the symbols for the main parts of the drawings *
12, 20, 30: butterfly
16, 32, 60: ROM
18, 34, 62: bit serial adder
To achieve the above object, the present invention provides an 8 * 1 and 2 * (4 * 1) DCT / IDCT core using the DA method, comprising: a first multiplexer for selecting the 8 * 1 DCT / IDCT or 2 * 4 * 1 DCT / IDCT mode from the input matrix signals (x0 to x7) and (y0 to y7) of the 8 * 1 DCT, 2 * 4 * 1 DCT, or 2 * 4 * 1 IDCT mode; a first butterfly for performing addition and subtraction on the input matrix according to the mode selected by the first multiplexer; a second multiplexer for selecting the DCT or IDCT mode from the addition and subtraction results of the first butterfly and the 8 * 1 IDCT input matrix; matrix arithmetic means for storing cosine coefficients of the input matrix and performing a matrix operation on the signal selected by the second multiplexer and the stored cosine coefficients; a bit serial adder for obtaining, over all bits, the cumulative sum of the matrix operation results calculated by the matrix arithmetic means; a second butterfly for adding and subtracting the cumulative sums of the bit serial adder; and a third multiplexer for selectively outputting the output of the bit serial adder or the output of the second butterfly.
The above and other objects and various advantages of the present invention will become more apparent from the preferred embodiments of the present invention described below with reference to the accompanying drawings.
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The distributed arithmetic (DA) algorithm performs the multiplications of the DCT / IDCT without using a multiplier. The DA method therefore enables an implementation of the DCT / IDCT that is efficient in terms of chip area and speed.
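The idea behind DA can be illustrated with a minimal sketch. It assumes signed 8-bit inputs and integer (fixed-point) coefficients; the function names `build_rom` and `da_inner_product` are hypothetical and do not come from the patent. The ROM holds, for every combination of input bits, the sum of the coefficients whose input bit is 1, and the loop plays the role of the bit serial adder.

```python
def build_rom(coeffs):
    """Precompute, for every bit pattern, the sum of the coefficients
    whose corresponding input bit is set (the DA lookup table)."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(rom, xs, bits=8):
    """Compute sum(c_k * x_k) for signed `bits`-wide integers using only
    table lookups, shifts, and additions (no multiplier)."""
    acc = 0
    for b in range(bits):
        # Build the ROM address from bit b of every input word.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        partial = rom[addr]
        # In two's complement the most significant bit has negative weight.
        if b == bits - 1:
            acc -= partial << b
        else:
            acc += partial << b
    return acc


coeffs = [3, -5, 7, 2]          # illustrative fixed-point coefficients
xs = [10, -20, 127, -128]       # signed 8-bit samples
print(da_inner_product(build_rom(coeffs), xs))
```

The table grows as 2^N with the number of inputs N, which is one reason a butterfly stage that halves the number of inputs per output is attractive in front of the ROM.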
In the DA method, since a two-dimensional DCT / IDCT can be separated into two one-dimensional DCT / IDCTs, the 8 * 8 DCT / IDCT can be implemented as two 8 * 1 DCT / IDCTs, and the 2 * 4 * 8 DCT / IDCT can be implemented as a 2 * 4 * 1 DCT / IDCT and an 8 * 1 DCT / IDCT. The 8 * 1 DCT / IDCT is expressed by [Equation 1] below.
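[Equation 1]
(DCT) [equation not recoverable from the source text]
(IDCT) [equation not recoverable from the source text]
[definitions of the terms of Equation 1 not recoverable from the source text]
If Equation 1 is implemented using the DA method, it can be expressed as the 8 * 1 DCT / IDCT matrix equations of [Equation 2] below.
[Equation 2]
(DCT) [matrix equation and term definitions not recoverable from the source text]
(IDCT) [matrix equation and term definitions not recoverable from the source text]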
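Equations 1 and 2 themselves do not survive in this text. As a point of reference only, a commonly used orthonormal form of the 8-point DCT and IDCT, and the even/odd split that a butterfly stage makes possible, can be written as follows. The scaling factor c(k)/2 is an assumption and may differ from the normalization used in the patent's own equations.

```latex
% Assumed orthonormal 8-point DCT / IDCT (not taken from the patent)
X(k) = \frac{c(k)}{2} \sum_{n=0}^{7} x(n) \cos\frac{(2n+1)k\pi}{16},
\qquad
x(n) = \sum_{k=0}^{7} \frac{c(k)}{2} X(k) \cos\frac{(2n+1)k\pi}{16},
\qquad
c(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k \neq 0 \end{cases}

% Because \cos\frac{(2(7-n)+1)k\pi}{16} = (-1)^k \cos\frac{(2n+1)k\pi}{16},
% the sum over 8 inputs folds into a sum over 4 butterfly outputs:
X(k) = \frac{c(k)}{2} \sum_{n=0}^{3} \bigl[x(n) + x(7-n)\bigr] \cos\frac{(2n+1)k\pi}{16}
\quad (k \text{ even}),
\qquad
X(k) = \frac{c(k)}{2} \sum_{n=0}^{3} \bigl[x(n) - x(7-n)\bigr] \cos\frac{(2n+1)k\pi}{16}
\quad (k \text{ odd})
```

With this split, each output is an inner product of four butterfly outputs with four fixed cosine coefficients, which is the form a DA lookup table evaluates.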
FIG. 1 illustrates an architecture for implementing the above-described 8 * 1 DCT and 8 * 1 IDCT with the DA method.
As shown, the 8 * 1 DA architecture includes: an input processor 10 that processes the video signal, typically in units of 8 bits, and outputs it as the input matrix (x0 to x7) / (y0 to y7) of the 8 * 1 DCT / IDCT; a first butterfly 12 that performs addition and subtraction on the signals (x0 to x7) output from the input processor 10; a ROM 16 that stores the cosine coefficients for the input signals (x0 to x7) / (y0 to y7) and performs the matrix operation between the stored cosine coefficients and either the addition and subtraction results of (x0 to x7) or the 8 * 1 IDCT input matrix (y0 to y7); a bit serial adder (BSA) 18 that accumulates, over all bits, the bit-wise matrix operation results output from the ROM 16; a second butterfly 20 that performs addition and subtraction on the cumulative sums of the bit serial adder 18; and an output processor 24 that processes the result and outputs the 8 * 1 DCT / IDCT result.
When performing the DCT in the DA architecture, addition and subtraction are first applied to the input signals (x0 to x7), and the matrix operation between these results and the cosine coefficients produces the output. When performing the IDCT, by contrast, the matrix operation between the input signals (y0 to y7) and the cosine coefficients is performed first, and the output is obtained by adding and subtracting its results.
That is, when performing the 8 * 1 DCT, the signals (x0 to x7) that have passed through the first butterfly 12 are selected by the 2 * 1 multiplexer 14, and when performing the 8 * 1 IDCT, the signal that has passed through the second butterfly 20 is selected by the 2 * 1 multiplexer 22. The DCT / IDCT selection signals required by the 2 * 1 multiplexers 14 and 22 are provided by a control unit (not shown) that controls the DCT / IDCT core.
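The ordering described above (butterfly before the matrix stage for the DCT, butterfly after it for the IDCT) can be checked numerically. The sketch below is an illustration under assumptions: it uses the orthonormal 8-point DCT scaling c(k)/2, plain floating-point sums stand in for the ROM and bit serial adder, and the function names are hypothetical.

```python
import math


def c(k):
    """Normalization factor of the assumed orthonormal DCT."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def cos8(n, k):
    return math.cos((2 * n + 1) * k * math.pi / 16)


def dct8_direct(x):
    """Reference 8-point DCT with a full 8 x 8 matrix product."""
    return [c(k) / 2 * sum(x[n] * cos8(n, k) for n in range(8))
            for k in range(8)]


def dct8_butterfly(x):
    """DCT path: butterfly first, then two 4 x 4 matrix products."""
    s = [x[n] + x[7 - n] for n in range(4)]   # sums feed even outputs
    d = [x[n] - x[7 - n] for n in range(4)]   # differences feed odd outputs
    return [c(k) / 2 * sum((s if k % 2 == 0 else d)[n] * cos8(n, k)
                           for n in range(4))
            for k in range(8)]


def idct8_butterfly(y):
    """IDCT path: two 4 x 4 matrix products first, then the butterfly."""
    out = [0.0] * 8
    for n in range(4):
        even = sum(c(k) / 2 * y[k] * cos8(n, k) for k in (0, 2, 4, 6))
        odd = sum(c(k) / 2 * y[k] * cos8(n, k) for k in (1, 3, 5, 7))
        out[n] = even + odd
        out[7 - n] = even - odd
    return out


x = [12, -3, 45, 7, 0, 88, -20, 5]
y = dct8_butterfly(x)
print(max(abs(a - b) for a, b in zip(y, dct8_direct(x))))
print(max(abs(a - b) for a, b in zip(idct8_butterfly(y), x)))
```

Both printed differences should be at the level of floating-point rounding, which shows that the folded 4 x 4 form computes the same transform as the full 8 x 8 product.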
Meanwhile, 2 * 4 * 1 DCT / IDCT means performing two 4 * 1 DCT / IDCTs simultaneously. In a digital VCR, each 8 * 8 block of the video signal is analyzed to determine whether to perform the 8 * 8 DCT / IDCT or the 2 * 4 * 8 DCT / IDCT, and one of the two modes is performed selectively. When performing the 2 * 4 * 8 DCT / IDCT, the 2 * 4 * 1 DCT / IDCT is performed first. In this case, one 4 * 1 DCT / IDCT is performed on the sums of the two fields and, at the same time, another 4 * 1 DCT / IDCT is performed on the differences between the two fields.
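A minimal sketch of the 2 * (4 * 1) forward transform follows. It rests on two assumptions that the text does not spell out: the two fields are interleaved, so samples x(2m) and x(2m+1) belong to different fields, and the 4-point DCT uses orthonormal scaling. The names `dct4` and `dct_2x4` are hypothetical.

```python
import math


def dct4(v):
    """Assumed orthonormal 4-point DCT."""
    out = []
    for k in range(4):
        ck = 1 / math.sqrt(2) if k == 0 else 1.0
        out.append(ck / math.sqrt(2) *
                   sum(v[n] * math.cos((2 * n + 1) * k * math.pi / 8)
                       for n in range(4)))
    return out


def dct_2x4(x):
    """Two 4-point DCTs: one on field sums, one on field differences."""
    sums = [x[2 * m] + x[2 * m + 1] for m in range(4)]
    diffs = [x[2 * m] - x[2 * m + 1] for m in range(4)]
    return dct4(sums) + dct4(diffs)


# Alternating lines 10, 6, 10, 6, ... mimic two fields with different levels.
print(dct_2x4([10, 6] * 4))
```

For this input only two coefficients are significantly non-zero: the DC term of the sum transform and the DC term of the difference transform, which is why this mode suits blocks where the two fields differ.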
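The 4 * 1 DCT / IDCT is expressed by [Equation 3]; to implement the 2 * 4 * 1 DCT / IDCT with the DA method, it is rewritten as [Equation 4].
[Equation 3]
(DCT) [equation not recoverable from the source text]
(IDCT) [equation and term definitions not recoverable from the source text]
[Equation 4]
(DCT) [matrix equation and term definitions not recoverable from the source text]
(IDCT) [matrix equation and term definitions not recoverable from the source text]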
FIG. 2 illustrates an architecture for implementing the above-described 2 * (4 * 1) DCT and 2 * (4 * 1) IDCT with the DA method.
As shown, the 2 * (4 * 1) DA architecture includes: an input processor 26 that processes the video signal, typically in units of 8 bits, and outputs the input matrices (x0 to x7) / (y0 to y7) of the 2 * (4 * 1) DCT / IDCT mode; a butterfly 30 that performs addition and subtraction on the signals (x0 to x7) and (y0 to y7) output from the input processor 26; a ROM 32 that stores the cosine coefficients for the input signals (x0 to x7) and (y0 to y7) and performs the matrix operation between the stored cosine coefficients and the addition and subtraction results; a bit serial adder (BSA) 34 that accumulates, over all bits, the bit-wise matrix operation results output from the ROM 32; and an output processor 36 that processes the cumulative sum of the bit serial adder 34 and outputs the 2 * (4 * 1) DCT / IDCT result.
The 2 * (4 * 1) DCT or 2 * (4 * 1) IDCT operation is selected by the 2 * 1 multiplexer 28. Once the operation is selected in the multiplexer 28, the matrix operation with the cosine coefficients for (x0 to x7) or (y0 to y7) is performed in the ROM 32. The bit serial adder 34 then accumulates the bit-wise matrix operation results over all bits. The cumulative sum obtained by the bit serial adder 34 is processed by the output processor 36 and output as the result of the 2 * (4 * 1) DCT / IDCT operation. The selection signal required by the 2 * 1 multiplexer 28 is provided by a controller (not shown) external to the DCT / IDCT core.
Referring to FIG. 3, there is shown the configuration of a DCT core shared between the 8 * 1 DCT / IDCT and the 2 * (4 * 1) DCT / IDCT, constructed in accordance with the present invention.
The shared DCT core of the present invention includes: an input processor 52 that processes the video signal in units of 8 bits and outputs the input matrices (x0 to x7) and (y0 to y7) of the 8 * 1 DCT / IDCT and 2 * (4 * 1) DCT / IDCT modes; a 3 * 1 multiplexer 54 that selects the input matrix signal of the 8 * 1 DCT, 2 * (4 * 1) DCT, or 2 * (4 * 1) IDCT mode output from the input processor 52; a butterfly 56 that performs addition and subtraction on the input matrix selected by the 3 * 1 multiplexer 54; a 2 * 1 multiplexer 58 that selects between the addition and subtraction results of the butterfly 56 and the input matrix of the 8 * 1 IDCT mode; a ROM 60 that stores the cosine coefficients for (x0 to x7) and (y0 to y7) and performs the matrix operation between the signal selected by the 2 * 1 multiplexer 58 and the stored cosine coefficients; a bit serial adder (BSA) 62 that accumulates, over all bits, the bit-wise matrix operation results output from the ROM 60; and an output processor that processes the cumulative sum of the bit serial adder 62 and outputs the DCT / IDCT result.
Whether to perform the DCT or the IDCT is determined by the DCT / IDCT selection control signal provided to the multiplexer 58 from outside the DCT / IDCT core. The output of the 3 * 1 multiplexer 54 is determined by the 8 * 1 or 2 * (4 * 1) mode selection control signal together with the DCT / IDCT selection control signal provided to the multiplexer 54.
In this way, the butterfly 56, which is used in common by the 8 * 1 DCT and the 2 * (4 * 1) DCT / IDCT, can be shared. The rear 2 * 1 multiplexer 66 selects the output of the rear butterfly 64 only when performing the 8 * 1 IDCT.
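The routing through the shared core can be modelled behaviourally as below. This is a sketch under several assumptions that are not stated in the text: the input pairings per mode, the orthonormal scaling, the factor 1/2 in the 2 * (4 * 1) IDCT, and the use of plain floating-point sums in place of the ROM 60 and bit serial adder 62. The function names and mode strings are hypothetical.

```python
import math


def _c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _cos(n, k, size):
    return math.cos((2 * n + 1) * k * math.pi / (2 * size))


def butterfly(pairs):
    """Return all sums followed by all differences of the given pairs."""
    pairs = list(pairs)
    return [a + b for a, b in pairs] + [a - b for a, b in pairs]


def shared_core(mode, v):
    # Front multiplexer (54) and butterfly (56): pairing depends on the mode.
    # Multiplexer 58 bypasses the butterfly for the 8 * 1 IDCT.
    if mode == "dct8":
        u = butterfly((v[n], v[7 - n]) for n in range(4))
    elif mode == "dct2x4":
        u = butterfly((v[2 * m], v[2 * m + 1]) for m in range(4))
    elif mode == "idct2x4":
        u = butterfly((v[k], v[k + 4]) for k in range(4))
    elif mode == "idct8":
        u = list(v)
    else:
        raise ValueError(mode)

    # Matrix stage (ROM 60 + bit serial adder 62), modelled as 4-term sums.
    lo, hi = u[:4], u[4:]
    if mode == "dct8":
        w = [_c(k) / 2 * sum((lo if k % 2 == 0 else hi)[n] * _cos(n, k, 8)
                             for n in range(4)) for k in range(8)]
    elif mode == "dct2x4":
        w = [_c(k) / math.sqrt(2) * sum(h[n] * _cos(n, k, 4) for n in range(4))
             for h in (lo, hi) for k in range(4)]
    elif mode == "idct2x4":
        rec = [[sum(_c(k) / math.sqrt(2) * h[k] * _cos(n, k, 4)
                    for k in range(4)) / 2 for n in range(4)]
               for h in (lo, hi)]
        w = [rec[i % 2][i // 2] for i in range(8)]  # re-interleave the fields
    else:  # idct8: even and odd partial sums for n = 0..3
        even = [sum(_c(k) / 2 * u[k] * _cos(n, k, 8) for k in (0, 2, 4, 6))
                for n in range(4)]
        odd = [sum(_c(k) / 2 * u[k] * _cos(n, k, 8) for k in (1, 3, 5, 7))
               for n in range(4)]
        w = even + odd

    # Rear butterfly (64) and multiplexer (66): used only for the 8 * 1 IDCT.
    if mode == "idct8":
        s = butterfly(zip(w[:4], w[4:]))
        return s[:4] + s[4:][::-1]
    return w


x = [12, -3, 45, 7, 0, 88, -20, 5]
for fwd, inv in (("dct8", "idct8"), ("dct2x4", "idct2x4")):
    restored = shared_core(inv, shared_core(fwd, x))
    print(fwd, max(abs(a - b) for a, b in zip(restored, x)))
```

In this model three of the four modes reuse the same front butterfly with a different input pairing, which matches the role the text gives to the 3 * 1 multiplexer 54, and only the 8 * 1 IDCT needs the rear butterfly.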
The 8 * 8 DCT / IDCT requires two 8 * 1 DCT / IDCTs and a matrix conversion circuit, while the 2 * 4 * 8 DCT / IDCT requires one 2 * 4 * 1 DCT / IDCT, one 8 * 1 DCT / IDCT, and a matrix conversion circuit. Thus, using the shared circuit described above, both can be implemented with one 8 * 1 / 2 * (4 * 1) DCT / IDCT shared circuit, one 8 * 1 DCT / IDCT, and a matrix conversion circuit.
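The row-column arrangement can be sketched as follows, assuming that the matrix conversion circuit performs a transposition between the two one-dimensional passes and that the 8-point DCT is orthonormal. The function names are hypothetical.

```python
import math


def dct8(x):
    """Assumed orthonormal 8-point DCT."""
    out = []
    for k in range(8):
        ck = 1 / math.sqrt(2) if k == 0 else 1.0
        out.append(ck / 2 * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16)
                                for n in range(8)))
    return out


def transpose(block):
    return [list(col) for col in zip(*block)]


def dct8x8(block):
    """2-D DCT as a row pass, a transposition, and a second 1-D pass."""
    rows_done = [dct8(row) for row in block]
    cols_done = [dct8(col) for col in transpose(rows_done)]
    return transpose(cols_done)


flat_block = [[16] * 8 for _ in range(8)]
coeffs = dct8x8(flat_block)
print(round(coeffs[0][0], 6))
```

For a flat block all the energy lands in the DC coefficient. In the 2 * 4 * 8 case one of the two passes would be replaced by the 2 * (4 * 1) transform, which is the pass the shared circuit serves.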
As described above, unlike a DCT core structure that implements the DCT / IDCT using multipliers, the present invention shares the DA structures of the 8 * 1 DCT / IDCT and the 2 * 4 * 1 DCT / IDCT in hardware, which provides the advantage of minimizing chip area.
Claims:
Claims (1)
1. A DCT core performing 8 * 1 DCT / IDCT and 2 * (4 * 1) DCT / IDCT using a DA method, comprising:
input processing means for processing an image signal in units of 8 bits and outputting it as input matrices (x0 to x7) and (y0 to y7) of the 8 * 1 DCT / IDCT and 2 * 4 * 1 DCT / IDCT modes;
first multiplexing means for selecting the 8 * 1 DCT / IDCT or 2 * 4 * 1 DCT / IDCT mode from the input matrix signals (x0 to x7) and (y0 to y7) of the 8 * 1 DCT, 2 * 4 * 1 DCT, or 2 * 4 * 1 IDCT mode output from the input processing means;
a first butterfly for performing addition and subtraction on the input matrix according to the DCT / IDCT mode selected by the first multiplexing means;
second multiplexing means for selecting the DCT or IDCT mode from the addition and subtraction results of the first butterfly and the 8 * 1 IDCT input matrix;
matrix arithmetic means for storing cosine coefficients of the input matrix and performing a matrix operation on the signal selected by the second multiplexing means and the stored cosine coefficients;
a bit serial adder for obtaining, over all bits, the cumulative sum of the bit-wise matrix operation results calculated by the matrix arithmetic means;
a second butterfly for adding and subtracting the cumulative sums of the bit serial adder; and
third multiplexing means for selectively outputting the output of the bit serial adder or the output of the second butterfly.
Patent family:
Publication number | Publication date
KR100225496B1|1999-10-15|
Legal status:
1996-12-31|Application filed by 이우복, 사단법인 고등기술연구원연구조합
1996-12-31|Priority to KR1019960081068A
1998-10-07|Publication of KR19980061694A
1999-10-15|Application granted
1999-10-15|Publication of KR100225496B1
Priority:
Application number | Filing date | Title
KR1019960081068A|KR100225496B1|1996-12-31|1996-12-31|Dct core architecture|